34 research outputs found

    Neuromorphic deep convolutional neural network learning systems for FPGA in real time

    Deep Learning algorithms have become one of the best approaches for pattern recognition in several fields, including computer vision, speech recognition, natural language processing, and audio recognition, among others. In image vision, convolutional neural networks stand out due to their relatively simple supervised training and their efficiency in extracting features from a scene. Nowadays, several implementations of convolutional neural network accelerators exist that manage to run these networks in real time. However, the number of operations and the power consumption of these implementations can be reduced by using a different processing paradigm, such as neuromorphic engineering. The neuromorphic engineering field studies the behavior of biological neural processing systems with the purpose of designing analog, digital or mixed-signal systems that solve problems by replicating the behavior and properties of biological neurons, inspired by how the human brain performs complex tasks. Neuromorphic engineering tries to answer how our brain is able to learn and perform complex tasks with high efficiency under the paradigm of spike-based computation. This thesis explores both frame-based and spike-based processing paradigms for the development of hardware architectures for visual pattern recognition based on convolutional neural networks. In this work, two FPGA implementations of convolutional neural network accelerator architectures for the frame-based paradigm, using OpenCL and SoC technologies, are presented. These are followed by a novel neuromorphic convolution processor for the spike-based paradigm, which implements the behaviour of the leaky integrate-and-fire neuron model. Furthermore, it reads data row by row, which allows multiple layers to be computed on the same chip. Finally, a novel FPGA implementation of the Hierarchy of Time Surfaces algorithm and a new memory model for spike-based systems are proposed.
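
    As a minimal illustrative sketch (not the thesis hardware, which is described in HDL), the leaky integrate-and-fire model mentioned above can be summarized as follows: each incoming spike adds a synaptic weight to a membrane potential, the potential leaks over time, and an output spike is emitted and the potential reset when a threshold is crossed. Parameter names and values below are assumptions chosen for the example.

        # Illustrative leaky integrate-and-fire (LIF) neuron; parameters are assumptions.
        class LIFNeuron:
            def __init__(self, threshold=1.0, leak_rate=0.01):
                self.v = 0.0                  # membrane potential
                self.threshold = threshold
                self.leak_rate = leak_rate    # potential lost per unit time
                self.last_t = 0.0             # time of the last update

            def receive(self, t, weight):
                """Integrate an incoming spike of weight `weight` at time t."""
                # Apply the leak for the elapsed time, never going below zero.
                self.v = max(0.0, self.v - self.leak_rate * (t - self.last_t))
                self.last_t = t
                self.v += weight              # integrate the incoming spike
                if self.v >= self.threshold:  # fire and reset
                    self.v = 0.0
                    return True
                return False

        # Example: input spikes of weight 0.4 every 10 time units fire every third spike.
        neuron = LIFNeuron()
        print([t for t in range(0, 100, 10) if neuron.receive(t, 0.4)])  # [20, 50, 80]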

    Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification

    Deep learning is a cutting-edge approach that is being applied to many fields. For vision applications, Convolutional Neural Networks (CNN) deliver significant accuracy on classification tasks. Numerous hardware accelerators have appeared in recent years to improve on CPU- or GPU-based solutions. This technology is commonly prototyped and tested on FPGAs before being considered for ASIC fabrication for mass production. The use of typical commercial cameras (30 fps) limits the capabilities of these systems for high-speed applications. Dynamic vision sensors (DVS), which emulate the behavior of a biological retina, are gaining importance for these applications because of their nature: the information is represented by a continuous stream of spikes, and the frames to be processed by the CNN are constructed by collecting a fixed number of these spikes (called events). The faster an object moves, the more events the DVS produces, and the higher the equivalent frame rate. Therefore, using a DVS allows a frame to be computed at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined design for FPGA able to collect events from an Address-Event-Representation (AER) DVS retina and obtain a normalized histogram to be used by a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS for the computation blocks that perform the normalization of a frame needed by the CNN. The results outperform previous implementations of frame collection and normalization using ARM processors running at 800 MHz on a Zynq7100, in both latency and power consumption. A measured 67% speedup factor is presented for a Roshambo CNN real-time experiment running at a 160 fps peak rate.
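
    The frame-construction step described above can be sketched in a few lines of Python (illustrative only; the paper implements it in VHDL/HLS): a fixed number of DVS events is accumulated into a 2-D histogram, which is then normalized before being handed to the CNN accelerator. The resolution, event count and normalization choice below are assumptions for the example, not NullHop's actual parameters.

        import numpy as np

        def events_to_histogram(events, width=64, height=64, events_per_frame=2048):
            """Collect a fixed number of (x, y) DVS events into a normalized histogram."""
            hist = np.zeros((height, width), dtype=np.float32)
            for count, (x, y) in enumerate(events, start=1):
                hist[y, x] += 1.0
                if count == events_per_frame:   # a frame is complete
                    break
            peak = hist.max()
            # Normalize to [0, 1]; the normalization used by the real pipeline may differ.
            return hist / peak if peak > 0 else hist

        # Example with synthetic random events.
        rng = np.random.default_rng(0)
        events = zip(rng.integers(0, 64, 4096), rng.integers(0, 64, 4096))
        frame = events_to_histogram(events)
        print(frame.shape, float(frame.max()))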

    Efficient DMA transfers management on embedded Linux PSoC for Deep-Learning gestures recognition: Using Dynamic Vision Sensor and NullHop one-layer CNN accelerator to play RoShamBo

    This demonstration shows a Dynamic Vision Sensor able to capture visual motion at a speed equivalent to a high-speed camera (20k fps). The collected visual information is presented as a normalized histogram to a CNN accelerator, called NullHop, that is able to process a pre-trained CNN to play Roshambo against a human. The CNN designed for this purpose consists of 5 convolutional layers and a fully connected layer. The latency for processing one histogram is 8 ms. NullHop is deployed on the FPGA fabric of a PSoC from Xilinx, the Zynq 7100, which integrates a dual-core ARM computer and a Kintex-7 with 444K logic cells in the same chip. The ARM computer runs Linux, and a specific C++ controller runs the whole demo. This controller runs in user space in order to extract the maximum throughput, thanks to an efficient use of the AXIStream interface based on DMA transfers. The short delay needed to process one visual histogram allows us to average several consecutive classification outputs, providing the best estimation of the symbol that the user presents to the visual sensor. This output is then mapped to present the winning symbol within the 60 ms latency that the brain considers acceptable before suspecting a trick. Ministerio de Economía y Competitividad TEC2016-77785-
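
    The averaging of consecutive classification outputs mentioned above can be sketched as follows (an illustration of the idea, not the demo's C++ controller): the per-class scores of the last few histograms are averaged and the winning symbol is the class with the highest mean score. The window size and label set are assumptions for the example.

        from collections import deque
        import numpy as np

        SYMBOLS = ["rock", "paper", "scissors", "background"]   # assumed label set

        class OutputAverager:
            """Average the most recent classifier outputs before picking a winner."""
            def __init__(self, window=5):
                self.recent = deque(maxlen=window)

            def update(self, scores):
                """`scores` is one vector of per-class scores from the accelerator."""
                self.recent.append(np.asarray(scores, dtype=np.float32))
                mean_scores = np.mean(self.recent, axis=0)
                return SYMBOLS[int(np.argmax(mean_scores))]

        # Example: three noisy score vectors that all favor the second class.
        averager = OutputAverager(window=3)
        for s in ([0.1, 0.7, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1]):
            winner = averager.update(s)
        print(winner)   # "paper"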

    Spiking row-by-row FPGA Multi-kernel and Multi-layer Convolution Processor.

    Spiking convolutional neural networks have become a novel approach for machine vision tasks, due to the low latency with which they process an input stimulus from a scene and the low power consumption of this kind of solution. Event-based systems only perform additions, instead of the sums of products required by frame-based systems. In this work, an upgrade of a neuromorphic event-based convolution accelerator for SCNNs is presented, which is able to compute multiple layers with different kernel sizes. The system has a latency per layer from 1.44 μs to 9.98 μs for kernel sizes from 1x1 to 7x7.
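
    The "additions only" property can be illustrated with a short Python sketch (not the FPGA design): when an event arrives at pixel (x, y), the kernel weights are simply added into the membrane potentials of the surrounding neighborhood and neurons that cross the threshold fire; no multiplications are involved. Array sizes, threshold and reset behavior are assumptions for the example.

        import numpy as np

        def project_event(potentials, kernel, x, y, threshold=1.0):
            """Add kernel weights around an event at (x, y); return addresses that fire."""
            kh, kw = kernel.shape
            h, w = potentials.shape
            out_spikes = []
            for dy in range(kh):
                for dx in range(kw):
                    ty, tx = y + dy - kh // 2, x + dx - kw // 2
                    if 0 <= ty < h and 0 <= tx < w:
                        potentials[ty, tx] += kernel[dy, dx]   # addition only, no product
                        if potentials[ty, tx] >= threshold:
                            potentials[ty, tx] = 0.0           # reset on firing
                            out_spikes.append((tx, ty))
            return out_spikes

        # Example: three identical events with a 3x3 kernel; the third one makes the
        # affected neighborhood fire.
        pot = np.zeros((8, 8), dtype=np.float32)
        k = np.full((3, 3), 0.4, dtype=np.float32)
        for _ in range(3):
            spikes = project_event(pot, k, 4, 4)
        print(spikes)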

    System based on inertial sensors for behavioral monitoring of wildlife

    A sensor network is an integration of multiple sensors into a system that collects information about different environmental variables. Monitoring systems allow us to determine the current state of a subject, to understand its behavior and, sometimes, to predict what is going to happen. This work presents a monitoring system for semi-wild animals that captures their actions using an IMU (inertial measurement unit) and a sensor fusion algorithm. Based on an ARM Cortex-M4 microcontroller, the system sends data from the different sensor axes using ZigBee technology in two different operating modes: RAW (logging all information to an SD card) or RT (real-time operation). The sensor fusion algorithm improves precision and reduces noise interference. Junta de Andalucía P12-TIC-130
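
    The abstract does not detail the sensor fusion algorithm, but a common choice for IMU data is a complementary filter that blends gyroscope integration with accelerometer tilt; the hypothetical Python sketch below is only meant to illustrate that idea, with made-up sample values and parameters.

        import math

        def complementary_filter(samples, dt=0.01, alpha=0.98):
            """Estimate pitch (degrees) from (ax, az, gyro_y) samples.

            ax, az are accelerometer readings in g, gyro_y is the pitch rate in deg/s.
            A generic complementary filter, not the algorithm of the actual system.
            """
            pitch = 0.0
            for ax, az, gyro_y in samples:
                accel_pitch = math.degrees(math.atan2(ax, az))   # absolute but noisy
                pitch = alpha * (pitch + gyro_y * dt) + (1.0 - alpha) * accel_pitch
            return pitch

        # Example: a still animal with the sensor tilted about 6 degrees.
        print(round(complementary_filter([(0.1, 0.99, 0.0)] * 200), 2))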

    A Sensor Fusion Horse Gait Classification by a Spiking Neural Network on SpiNNaker

    The study and monitoring of the behavior of wildlife has always been a subject of great interest. Although many systems can track animal positions using GPS, behavior classification is not a common task. For this work, a multi-sensory wearable device has been designed and implemented to be used in the Doñana National Park in order to control and monitor wild and semi-wild animals. The data obtained with these sensors is processed using a Spiking Neural Network (SNN) with Address-Event-Representation (AER) coding and classified into a set of fixed activity behaviors. This work presents the full infrastructure deployed in Doñana to collect the data, the wearable device, the SNN implementation on SpiNNaker and the classification results. Ministerio de Economía y Competitividad TEC2012-37868-C04-02. Junta de Andalucía P12-TIC-130
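
    The abstract does not specify how the sensor samples are encoded as spikes; a simple rate-coding scheme, where the magnitude of a sample determines how many AER events are emitted on that channel's address within a time window, is sketched below purely as an illustration. Parameter names and ranges are assumptions.

        def rate_code(sample, channel, window_us=1000, max_spikes=16, full_scale=2.0):
            """Rate-code one sensor sample as a list of AER events (timestamp_us, address)."""
            n = min(max_spikes, int(max_spikes * abs(sample) / full_scale))
            if n == 0:
                return []
            step = window_us // n          # spread the spikes evenly over the window
            return [(i * step, channel) for i in range(n)]

        # Example: a 1.0 g acceleration sample emitted on AER address 3.
        print(rate_code(1.0, channel=3))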

    Interfacing PDM sensors with PFM spiking systems: application for Neuromorphic Auditory Sensors

    In this paper we present a sub-system to convert audio information from low-power MEMS microphones with pulse density modulation (PDM) output into rate-coded spike streams. These spikes represent the input signal of a Neuromorphic Auditory Sensor (NAS), which is implemented with Spike Signal Processing (SSP) building blocks. For this conversion, we have designed an HDL component for FPGA able to interface with PDM microphones and convert their pulses into temporally distributed spikes following a pulse frequency modulation (PFM) scheme with an accurate, configurable inter-spike interval. The new FPGA component has been tested in two scenarios: first as a stand-alone circuit for its characterization, and then integrated with a full NAS design to verify its behavior. This PDM interface demands less than 1% of the resources of a Spartan-6 FPGA and has a power consumption below 5 mW. Ministerio de Economía y Competitividad TEC2016-77785-
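
    A simplified software sketch of the PDM-to-PFM conversion idea (not the SSP building blocks of the actual HDL component): the ones of the PDM bitstream are accumulated, and a spike is emitted each time the accumulator reaches a configurable step, subject to a minimum inter-spike interval, so the output spike rate follows the input pulse density. The parameter values are assumptions.

        def pdm_to_spikes(pdm_bits, spike_step=8, min_isi=4):
            """Convert a PDM bitstream (0/1 samples) into spike times (sample indices)."""
            spikes, ones, last = [], 0, -min_isi
            for i, bit in enumerate(pdm_bits):
                ones += bit
                if ones >= spike_step and i - last >= min_isi:
                    spikes.append(i)       # emit a spike and keep the remainder
                    ones -= spike_step
                    last = i
            return spikes

        # Example: a loud segment (all ones) followed by a quieter one (alternating bits).
        stream = [1] * 64 + [1, 0] * 32
        print(pdm_to_spikes(stream))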

    Live Demonstration: Neuromorphic Row-by-Row Multi-convolution FPGA Processor-SpiNNaker architecture for Dynamic-Vision Feature Extraction

    In this demonstration, a spiking neural network architecture for vision recognition is presented, using an FPGA spiking convolution processor based on leaky integrate-and-fire (LIF) neurons and a SpiNNaker board. The network has been trained with the Poker-DVS dataset in order to classify the four different card symbols. The spiking convolution processor extracts features from the images in the form of spikes, computed by one layer of 64 convolutions. These features are sent to an OKAERtool board that converts from AER to the 2-of-7 protocol, so they can be classified by a spiking neural network deployed on the SpiNNaker platform.
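
    Between the convolution processor and SpiNNaker, each spike travels as an AER address; a hypothetical packing of (x, y, feature map) into a single address word is sketched below to illustrate that representation. The field widths are assumptions; the real format used by the processor and the OKAERtool board may differ.

        def pack_aer(x, y, feature, x_bits=7, y_bits=7):
            """Pack an (x, y, feature-map) spike into one AER address word."""
            assert x < (1 << x_bits) and y < (1 << y_bits)
            return (feature << (x_bits + y_bits)) | (y << x_bits) | x

        def unpack_aer(addr, x_bits=7, y_bits=7):
            """Recover (x, y, feature-map) from an AER address word."""
            x = addr & ((1 << x_bits) - 1)
            y = (addr >> x_bits) & ((1 << y_bits) - 1)
            return x, y, addr >> (x_bits + y_bits)

        # Round-trip example for a spike of feature map 42 at pixel (17, 5).
        addr = pack_aer(17, 5, 42)
        print(addr, unpack_aer(addr))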

    Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

    Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since the many convolutional and fully-connected layers demand a large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored; they allow the memory hierarchy to be implemented using embedded parallel BlockRAMs, which boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies. This makes FPGAs potentially powerful solutions for real-time CNN classification. In this paper, the OpenCL co-design frameworks for pseudo-automatic development adopted by both Altera and Xilinx are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times. Ministerio de Economía y Competitividad TEC2016-77785-
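
    The memory-bottleneck argument can be made concrete with a back-of-the-envelope model (generic layer sizes, not the 5-layer CNN evaluated in the paper): a convolutional layer reuses each weight many times, while in a fully connected layer every weight is read for a single multiply-accumulate, so its arithmetic intensity collapses and off-chip bandwidth dominates.

        def conv_layer_cost(h, w, c_in, c_out, k, bytes_per_value=2):
            """MACs and minimum off-chip traffic (bytes) for one convolutional layer."""
            macs = h * w * c_out * c_in * k * k
            traffic = bytes_per_value * (h * w * c_in + h * w * c_out + c_in * c_out * k * k)
            return macs, traffic

        def fc_layer_cost(n_in, n_out, bytes_per_value=2):
            """MACs and minimum off-chip traffic (bytes) for one fully connected layer."""
            macs = n_in * n_out
            traffic = bytes_per_value * (n_in + n_out + n_in * n_out)
            return macs, traffic

        layers = [("conv 56x56, 64->128 ch, 3x3", conv_layer_cost(56, 56, 64, 128, 3)),
                  ("fully connected 4096->4096", fc_layer_cost(4096, 4096))]
        for name, (macs, traffic) in layers:
            print(f"{name}: {macs / 1e6:.0f} MMAC, {traffic / 1e6:.1f} MB, "
                  f"{macs / traffic:.1f} MAC/byte")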

    Event-based Row-by-Row Multi-convolution engine for Dynamic-Vision Feature Extraction on FPGA

    Neural network algorithms are commonly used to recognize patterns in different data sources such as audio or vision. In image recognition, Convolutional Neural Networks are one of the most effective techniques, due to the high accuracy they achieve. This kind of algorithm requires billions of addition and multiplication operations over all the pixels of an image. However, it is possible to reduce the number of operations by using computer vision techniques other than frame-based ones, e.g. neuromorphic frame-free techniques. There exist many neuromorphic vision sensors that detect the pixels that have changed their luminosity. In this study, an event-based convolution engine for FPGA is presented. This engine models an array of leaky integrate-and-fire neurons. It is able to apply different kernel sizes, from 1x1 to 7x7, which are computed row by row, with a maximum of 64 different convolution kernels. The design presented is able to process 64 feature maps of 7x7 with a latency of 8.98 μs. Ministerio de Economía y Competitividad TEC2016-77785-
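
    The row-by-row computation mentioned above can be sketched in Python as follows (an illustration of the memory access pattern, not the FPGA engine): the membrane potentials are stored one image row per memory row, so each kernel row produces a single contiguous update of one potential row, using additions only. Sizes are assumptions for the example.

        import numpy as np

        def convolve_event_row_by_row(row_memory, kernel, x, y):
            """Apply a kernel around an event at (x, y), touching one memory row per step."""
            kh, kw = kernel.shape
            width = row_memory[0].size
            for dy in range(kh):                      # one pass per kernel row
                ty = y + dy - kh // 2
                if not 0 <= ty < len(row_memory):
                    continue
                x0 = max(0, x - kw // 2)
                x1 = min(width, x + kw // 2 + 1)
                k0 = x0 - (x - kw // 2)
                # One contiguous update of a single potential-memory row.
                row_memory[ty][x0:x1] += kernel[dy, k0:k0 + (x1 - x0)]

        # Example: a 7x7 kernel applied to an event at (10, 3) in a 16-row potential memory.
        memory = [np.zeros(32, dtype=np.float32) for _ in range(16)]
        convolve_event_row_by_row(memory, np.ones((7, 7), dtype=np.float32), x=10, y=3)
        print(sum(float(row.sum()) for row in memory))   # 49.0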